Abstract

In this paper a hierarchical Bayesian (HB) model is adopted to derive selection procedures
for selecting the best of $k$ binomial parameters, say the probabilities of success
corresponding to $k$ different suppliers. This model facilitates the use of prior information
in the analysis for both small and large sample sizes. In addition to computing posterior
probabilities that the $i$th supplier is best, this paper presents expressions for deciding how
much better a given supplier is relative to the others. Prior information is assumed to
begin with exchangeability and can be more informative if the experimenter has other
knowledge about the suppliers as a group. A numerical example is given and the paper
concludes with remarks about future work.

Hierarchical Bayesian Selection Procedures
for the Best Binomial Population

1.  Introduction 

Suppose there are $k$ suppliers of a particular item and a sample of $n_i$ items is taken
from the $i$th supplier yielding $x_i$, the number of successes (or defectives) in the sample.
Then $x_i$ has a binomial distribution with parameter $\theta_i$, which denotes the true unknown
probability of a success (failure) from the $i$th supplier. The best supplier is defined to
be the one with largest (smallest) $\theta_i$. Based on the observed data and prior information
available, we seek procedures which will select a non-empty subset of the $k$ suppliers and
assert with some confidence that the best supplier is amongst those in the selected subset.
In this paper a hierarchical Bayesian (HB) model is adopted and the behaviour of various
selection procedures thus obtained is studied. This application to binomial data parallels
the normal means problem considered by Berger and Deely (1988).

The problem of selecting the best binomial population has received considerable attention
in the literature, mainly from a non-Bayesian approach. Pioneering papers by Sobel
and Huyett (1957) and Gupta and Sobel (1960) dealt with selecting the best and selecting
a subset containing the best binomial population respectively. Later, Gupta, Huang and
Huang (1976) studied a conditional subset selection rule and a related test of homogeneity.
A good discussion of these and other non-Bayesian papers can be found in books by Gibbons,
Olkin and Sobel (1977) and Gupta and Panchapakesan (1979). It is not the purpose
of this paper to discuss the relative merits of the non-Bayesian vs Bayesian approaches,
but we believe that for the binomial selection problem studied in this paper, the Bayesian
model contains the facility to deal easily with the type of information which is likely to
occur in practice and in that sense offers the practitioner a more appealing model.


There have been some Bayesian and empirical Bayesian papers dealing with the binomial
selection problem. Deely (1965) developed empirical Bayes procedures for general
selection problems including among them the binomial case with independent $\theta_i$'s,
each with a beta prior with unknown parameters. Gupta and Liang (1986) derived nonparametric
empirical Bayes procedures for selecting the best binomial population under the
assumption that the $\theta_i$'s are independent, each with an unknown nonparametric prior
distribution. Bratcher and Bland (1975) considered a naive Bayesian approach in which
the $\theta_i$'s are independent with known but perhaps different beta priors. They considered
various multiple comparisons based on computing the posterior probabilities of each
population being best and used numerical integration to calculate these. Later Yang (1987)
applied their model but adopted the so-called PP* criterion, which had been previously
introduced for a general selection problem by Gupta and Yang (1985). This criterion, in
an effort to relate the Bayesian criterion to the classical P* condition, states that the Bayes
P* procedure selects the smallest subset for which the posterior probability that the subset
contains the best is at least P*.

There  have  been  other  relevant  papers  dealing  with  estimation  as  opposed  to  selection 
for  the  binomial  case.  Albert  (1984)  considers  the  simultaneous  estimation  of  k  binomial 
probabilities  and  develops  empirical  Bayes  estimators  under  an  exchangeable  hierarchical 
model.  Leonard  (1972)  also  considers  this  problem  but  uses  a  logit  transformation  to  bring 
the  problem  into  a  multivariate  normal  context.  A  lot  acceptance  problem  was  considered 
by  Eaves  (1980)  in  which  n  items  are  drawn  from  each  of  k  lots  under  binomial  sampling. 
An  exchangeable  hierarchical  model  is  assumed  and  the  predictive  distribution  for  the  next 
lot  is  computed  when  all  items  from  all  lots  are  good. 

A related problem, that of allocating the observations to the various suppliers
constrained by the fact that the total is fixed, has received some attention in the literature.
Brooks (1987) deals with a Bayesian approach for $k = 2$, while Brittain and Schlesselman
(1982) discuss this case from a frequentist viewpoint when trying to estimate $p_1 - p_2$
or $p_1/p_2$. These problems will not be discussed but some conclusions about allocation can
be drawn from the work presented here and these will be discussed in Section 5.

The approach in this paper is to present a model which has the capability of incorporating
prior information concerning the suppliers as a group into the analysis. The
literature to date, while recognizing the usefulness of such prior information in other
problems (see for example Berger (1985), Chapter 3), has not applied such models to the
binomial selection problem. The hierarchical Bayesian model is one way this can be done,
easily and with useful results. These ideas are discussed more thoroughly in Section 3
after having presented the mathematical details of the model in Section 2. An example
illustrating various aspects of the model is given in Section 4 with concluding remarks and
suggestions for further work given in Section 5.

2.  Mathematical  details,  the  prior  distribution  and  selection  criteria 

Let $x = (x_1, \ldots, x_k)$ be the vector of observations from the $k$ suppliers, $x_i$ conditional
on $\theta_i$ having the binomial distribution

$$f(x_i|\theta_i) = \binom{n_i}{x_i}\theta_i^{x_i}(1-\theta_i)^{n_i-x_i}, \qquad x_i = 0, 1, \ldots, n_i,$$

and let $\underline{\theta} = (\theta_1, \theta_2, \ldots, \theta_k)$ be the vector of unknown parameters for which we want to
select that supplier with largest $\theta_i$. The prior distribution $\pi(\underline{\theta})$ on $\underline{\theta}$ will be obtained
via the hierarchical Bayesian structure (see Berger (1985), Section 4.6) in which $\pi(\underline{\theta})$ is
given as a mixture of a prior conditional on hyperparameters $\beta$ and $\eta$ with a hyperprior
distribution on these parameters; that is,

$$\pi(\underline{\theta}) = \int\!\!\int \pi(\underline{\theta}|\beta,\eta)\,h(\beta,\eta)\,d\beta\,d\eta.$$


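Conditional upon the hyperparameters $\beta, \eta$, the components of $\underline{\theta}$ are assumed to be i.i.d.
with a common beta distribution given by

$$\pi(\theta_i|\beta,\eta) = \frac{\Gamma(1/\eta)}{\Gamma(\beta/\eta)\,\Gamma((1-\beta)/\eta)}\,\theta_i^{\beta/\eta-1}(1-\theta_i)^{(1-\beta)/\eta-1}, \qquad 0 < \theta_i < 1,$$

where $0 < \beta < 1$, $\eta > 0$; thus

$$\pi(\underline{\theta}|\beta,\eta) = \prod_{i=1}^{k}\pi(\theta_i|\beta,\eta).$$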


This particular form of the beta distribution will be convenient for the numerical
computations and elicitation of prior information. These topics will be discussed more fully in
the next section. Special note is taken here that

$$E(\theta_i|\beta,\eta) = \beta \quad\text{and}\quad \sigma^2 = \mathrm{Var}(\theta_i|\beta,\eta) = \beta(1-\beta)\,\frac{\eta}{\eta+1}.$$
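The two moments quoted above can be checked numerically. The following is a minimal sketch, assuming the usual beta shape parameters are $a = \beta/\eta$ and $b = (1-\beta)/\eta$ as used later in this section; the function names and the trial values $\beta = 0.4$, $\eta = 0.25$ are hypothetical and chosen only for illustration.

```python
def beta_shape(beta, eta):
    """Map the hyperparameters (beta, eta) to beta shape parameters (a, b)."""
    if not (0.0 < beta < 1.0) or eta <= 0.0:
        raise ValueError("need 0 < beta < 1 and eta > 0")
    return beta / eta, (1.0 - beta) / eta


def beta_moments(a, b):
    """Mean and variance of a beta distribution with shape parameters (a, b)."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1.0))
    return mean, var


# Hypothetical trial values, not taken from the paper.
beta, eta = 0.4, 0.25
a, b = beta_shape(beta, eta)
mean, var = beta_moments(a, b)
print(mean, var, beta * (1.0 - beta) * eta / (eta + 1.0))
```

The printed variance should agree with $\beta(1-\beta)\eta/(\eta+1)$, which is at most $1/4$ for every $\beta$ and $\eta$.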


The hyperparameters $\beta$ and $\eta$ will be taken as independent so that

$$h(\beta,\eta) = h_1(\beta)\,h_2(\eta)$$

in which $h_1$ will be some member of the beta family and $h_2$ some member of the family
given by

$$h_2(\eta) = \begin{cases} \dfrac{m-1}{4pmc\,(\eta+1)^2}, & 0 < \eta < 4c/(1-4c), \\[2ex] \dfrac{m-1}{4pmc}\,(\eta+1)^{m-2}\,(4c/\eta)^m, & \eta > 4c/(1-4c), \end{cases} \tag{2.3}$$

where $p$ is the normalizing constant, $c$ and $m$ are parameters, $0 < c < 1/4$ and $m > 2$.
The values of the parameters for $h_1$ and $h_2$ will depend upon what prior information is
available, the noninformative exchangeable case being $h_1$ uniform and $h_2$ having $c = 1/4$
and $m = \infty$.

We will use the following notation for the beta distribution:

$$g(y|a,b) = B(a,b)\,y^{a-1}(1-y)^{b-1},$$

$$B(a,b) = \Gamma(a+b)/(\Gamma(a)\Gamma(b)),$$

$$G(t|a,b) = \int_0^t g(y|a,b)\,dy.$$

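Under the notation and assumptions above, it follows that the conditional distribution of
$\underline{\theta}$ given $x$, $\beta$ and $\eta$ is given by

$$\pi(\underline{\theta}|x,\beta,\eta) = \prod_{i=1}^{k}\pi(\theta_i|x_i,\beta,\eta)$$

where

$$\pi(\theta_i|x_i,\beta,\eta) = \frac{f(x_i|\theta_i)\,\pi(\theta_i|\beta,\eta)}{f(x_i|\beta,\eta)} = g(\theta_i|a_i,b_i), \tag{2.4}$$

$$f(x_i|\beta,\eta) = \int_0^1 f(x_i|\theta_i)\,\pi(\theta_i|\beta,\eta)\,d\theta_i = \binom{n_i}{x_i}B(a,b)/B(a_i,b_i)$$

and $a = \beta/\eta$, $b = (1-\beta)/\eta$, $a_i = a + x_i$, $b_i = b + n_i - x_i$. Let

$$f(x|\beta,\eta) = \prod_{i=1}^{k} f(x_i|\beta,\eta) \quad\text{and}\quad f(x) = \int_0^\infty\!\!\int_0^1 f(x|\beta,\eta)\,h_1(\beta)h_2(\eta)\,d\beta\,d\eta. \tag{2.5}$$

Then the posterior distribution of $\underline{\theta}$ given the data $x$ can be written as

$$\pi(\underline{\theta}|x) = \int_0^\infty\!\!\int_0^1 \pi(\underline{\theta}|x,\beta,\eta)\,\frac{f(x|\beta,\eta)}{f(x)}\,h_1(\beta)h_2(\eta)\,d\beta\,d\eta. \tag{2.6}$$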

In fact we will not require the precise form of $\pi(\underline{\theta}|x)$ since decisions about which supplier
or subset of suppliers should be selected will be based on easily computed expectations
taken with respect to this posterior. We now develop two such criteria.

(i)  Posterior  probability  of  getting  the  best. 


Let

$$p_j(b) = P(\theta_j > \theta_i + b \ \text{for all}\ i \ne j \mid x),$$

where $b \in [0, 1]$. It will be noted that $p_j(0)$ is just the posterior probability that $\theta_j$ is largest
and the usual PP* selection criterion of Gupta and Yang is obtained by putting in the
selected subset the smallest number of suppliers for which the sum of their corresponding
$p_j(0)$'s is at least P*. We have here suggested a stronger criterion for selection purposes;
one that allows the practitioner to express a quantity $b$, i.e. how superior the best
has to be, and a probability P* to be attained by the selected subset. Of course for $b > 0$
it is no longer true that $\sum_j p_j(b) = 1$ and in fact it may be that for a given $b > 0$ no supplier
is better than the others by amount $b$ with high enough probability. The experimenter
can easily take another look and perhaps lower $b$ or the probability requirement. In any
case we believe the $p_j(b)$'s provide a useful criterion for selecting one or more suppliers
and give the experimenter an interpretation which relates to the practical problem.
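The PP* rule described above is a simple greedy accumulation once the $p_j(b)$'s are available. The sketch below is one possible way to code it, not the authors' implementation; the function name `pp_star_subset` is hypothetical, and the input probabilities are the noninformative, $n_i = 10$ figures reported in Table 1 of Section 4.

```python
def pp_star_subset(post_probs, p_star):
    """Return (labels, total) for the smallest subset whose p_j(b) sum to >= p_star.

    post_probs maps a supplier label to its posterior probability p_j(b).
    If even the full set falls short of p_star (possible when b > 0),
    the labels entry is None so the caller can lower b or p_star.
    """
    ranked = sorted(post_probs.items(), key=lambda item: item[1], reverse=True)
    chosen, total = [], 0.0
    for label, prob in ranked:
        if total >= p_star:
            break
        chosen.append(label)
        total += prob
    return (chosen, total) if total >= p_star else (None, total)


# p_j(0) and p_j(.10) as reported in Table 1 (noninformative, n_i = 10).
p_b0 = {1: 0.006, 2: 0.194, 3: 0.800}
p_b10 = {1: 0.001, 2: 0.088, 3: 0.628}
print(pp_star_subset(p_b0, 0.95))
print(pp_star_subset(p_b10, 0.90))
```

With $b = 0.10$ the three probabilities sum to less than 0.90, which illustrates the situation mentioned in the text where no subset attains the requested P*.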

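Using (2.1) and letting $A_j(b) = \{\underline{\theta} : \theta_j > \theta_i + b \ \text{for all}\ i \ne j\}$, the expression for $p_j(b)$
can be derived as follows:

$$
\begin{aligned}
p_j(b) &= \int_{A_j(b)} \pi(\underline{\theta}|x)\,d\underline{\theta} \\
&= \int_0^\infty\!\!\int_0^1 \left[\int_{A_j(b)} \pi(\underline{\theta}|x,\beta,\eta)\,d\underline{\theta}\right] \frac{f(x|\beta,\eta)}{f(x)}\,h_1(\beta)h_2(\eta)\,d\beta\,d\eta \\
&= \int_0^\infty\!\!\int_0^1 \left[\int_0^1 \prod_{i \ne j} G(\theta_j - b|a_i,b_i)\,g(\theta_j|a_j,b_j)\,d\theta_j\right] \frac{f(x|\beta,\eta)}{f(x)}\,h_1(\beta)h_2(\eta)\,d\beta\,d\eta
\end{aligned}
\tag{2.7}
$$

noting that the terms in brackets are equal, $a_i$ and $b_i$ being defined earlier and $G(t|a,b)$
being taken as 0 for $t \le 0$. Thus evaluation of each $p_j(b)$ requires only a three dimensional
numerical integration for all choices of $h_1$ and $h_2$, provided the incomplete beta function
is available.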
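For fixed hyperparameters, the bracketed inner term of (2.7) is the probability that one of $k$ independent beta variables exceeds all the others by at least $b$. As an illustration only, the sketch below estimates that term by Monte Carlo simulation rather than by the incomplete beta integration used in the paper; the function name, the fixed values $\beta = 0.4$, $\eta = 0.1$ and the data are assumptions made for the example.

```python
import random


def conditional_p_best(a_post, b_post, margin=0.0, draws=20000, seed=0):
    """Monte Carlo estimate of P(theta_j > theta_i + margin for all i != j)
    for independent theta_i ~ Beta(a_post[i], b_post[i]).

    This approximates the bracketed term of (2.7) for FIXED (beta, eta);
    it does not integrate over the hyperpriors.
    """
    rng = random.Random(seed)
    k = len(a_post)
    wins = [0] * k
    for _ in range(draws):
        theta = [rng.betavariate(a_post[i], b_post[i]) for i in range(k)]
        for j in range(k):
            if all(theta[j] > theta[i] + margin for i in range(k) if i != j):
                wins[j] += 1
    return [w / draws for w in wins]


# Hypothetical inputs: x = (1, 4, 6) successes out of n = 10 each, with the
# hyperparameters fixed at beta = 0.4, eta = 0.1, so a = 4 and b = 6.
x, n, a, b = [1, 4, 6], 10, 4.0, 6.0
a_post = [a + xi for xi in x]          # a_i = a + x_i
b_post = [b + n - xi for xi in x]      # b_i = b + n_i - x_i
p0 = conditional_p_best(a_post, b_post, margin=0.0)
p10 = conditional_p_best(a_post, b_post, margin=0.10)
print(p0, p10)
```

Because both calls reuse the same seed, the draws are identical and each estimate with margin 0.10 cannot exceed the corresponding estimate with margin 0. A full evaluation of $p_j(b)$ would still average this quantity over $\beta$ and $\eta$ with weights proportional to $f(x|\beta,\eta)h_1(\beta)h_2(\eta)$.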

(ii) Expected number of future successes.

Another useful criterion for selection purposes is obtained by considering a future
observation. Suppose $n$ future observations are to be taken from one of the suppliers. Let
the total number of successes be denoted by $Y$ and compute $E(Y_i)$ for each supplier
$i = 1, \ldots, k$, where $E$ is the expectation taken with respect to the distribution of $Y$ conditional
on $x$, i.e. the predictive distribution. Then the supplier with largest $E(Y_i)$ is called best.
Calculation for $E(Y_i)$ is easily obtained as

$$E(Y_i) = E(Y_i|x) = \int_0^1 E(Y_i|\theta_i)\,\pi(\theta_i|x)\,d\theta_i = \int_0^1 n\theta_i\,\pi(\theta_i|x)\,d\theta_i = nE(\theta_i|x).$$

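Thus ranking suppliers on the basis of largest expectation of the number of successes in $n$
future items is equivalent to ranking them on the basis of their posterior means based on
the present data $x$. An expression for $E(\theta_i|x)$ involves only a two dimensional integration
and is given by:

$$
\begin{aligned}
E(\theta_i|x) &= \int_0^1 \theta_i\,\pi(\theta_i|x)\,d\theta_i \\
&= \int_0^\infty\!\!\int_0^1 \frac{a_i}{a_i+b_i}\,\frac{f(x|\beta,\eta)}{f(x)}\,h_1(\beta)h_2(\eta)\,d\beta\,d\eta \\
&= \int_0^\infty\!\!\int_0^1 \left[\frac{\beta}{1+\eta n_i} + \frac{\eta n_i}{1+\eta n_i}\left(\frac{x_i}{n_i}\right)\right] \frac{f(x|\beta,\eta)}{f(x)}\,h_1(\beta)h_2(\eta)\,d\beta\,d\eta
\end{aligned}
\tag{2.8}
$$

using (2.4), (2.5) and noting that the mean of $g(\theta_i|a_i,b_i)$ is $a_i/(a_i+b_i)$.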
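The two dimensional integration in (2.8) can be approximated on a grid. The following is a rough sketch under stated assumptions, not the authors' numerical method: it takes $h_1$ uniform on $(0,1)$ and $h_2(\eta) = 1/(\eta+1)^2$, so that the substitution $u = \eta/(\eta+1)$ makes both integrals run over $(0,1)$ with constant weight, and it uses a plain midpoint rule. The function names and the grid size are hypothetical.

```python
from math import exp, lgamma


def log_marginal(x, n, beta, eta):
    """log f(x_i | beta, eta): beta-binomial with a = beta/eta, b = (1-beta)/eta."""
    a, b = beta / eta, (1.0 - beta) / eta
    log_binom = lgamma(n + 1) - lgamma(x + 1) - lgamma(n - x + 1)
    return (log_binom + lgamma(a + b) - lgamma(a) - lgamma(b)
            + lgamma(a + x) + lgamma(b + n - x) - lgamma(a + b + n))


def posterior_means(xs, ns, grid=120):
    """Midpoint-rule approximation to E(theta_i | x) in (2.8).

    Assumes h1 uniform and h2(eta) = 1/(eta+1)^2, i.e. uniform in
    u = eta/(eta+1); the normalizing f(x) cancels in the ratio num/den.
    """
    k = len(xs)
    num = [0.0] * k
    den = 0.0
    for r in range(grid):
        beta = (r + 0.5) / grid
        for s in range(grid):
            u = (s + 0.5) / grid
            eta = u / (1.0 - u)
            weight = exp(sum(log_marginal(x, n, beta, eta) for x, n in zip(xs, ns)))
            den += weight
            for i in range(k):
                # Mean of g(theta_i | a_i, b_i) written as in the last line of (2.8).
                num[i] += weight * (beta + eta * xs[i]) / (1.0 + eta * ns[i])
    return [value / den for value in num]


# Equal sample proportions of 0.4 with unequal sample sizes (illustrative data).
means = posterior_means([4, 8, 12], [10, 20, 30])
print([round(m, 3) for m in means])
```

Each posterior mean is a weighted average of $\beta$ and the sample proportion $x_i/n_i$, so with equal proportions the three values should lie close together, with the smallest sample shrunk the most towards the centre of the $\beta$ distribution.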


Using the posterior means to rank the suppliers, an appropriate decision about which
subset of suppliers to select can be made, e.g. put in the selected subset supplier $i$ if and
only if $E(\theta_i|x) > c$. This selection procedure assures the decision maker that each of the
suppliers thus selected will have the expected number of successes at least as large as $nc$.

Alternatively, if one selects a subset of size $r$, $r < k$, based on the $r$ largest $E(\theta_j|x)$, then
the expected number of successes for that subset is larger than that for any other subset
of size $r$. Further amplification of this point will be made in Section 5. We now turn our
attention to the various choices for $h_1$ and $h_2$ and discuss the influence of prior information
on these choices.

3.  Prior information and elicitation for $h_1$ and $h_2$

There are two main advantages of the seemingly complicated hierarchical structure.
Firstly, it provides a realistic Bayesian model which can easily accommodate the type of
prior information which is likely to be available; secondly, it is the appropriate model for
what is commonly called the parametric empirical Bayes approach (see Morris (1983)). In
the particular application made herein to suppliers' data, it is clear that there is some prior
information concerning the suppliers as a group, i.e. approximately where their quality is
likely to be and what sort of variability amongst the $\theta_i$'s can be expected. But if this
kind of information is unavailable, then it is still sensible to treat the $\theta_i$'s as exchangeable
with noninformative hyperpriors. Both of these ideas are covered in the HB model. This
type of prior information is to be contrasted with those Bayesian models which assume the
$\theta_i$'s are independent with known but perhaps different distributions. This approach is
generally quite unrealistic and therefore has limited application. On the other hand it is
sometimes argued that a prior distribution on the $\theta_i$'s exists but is unknown. When this
prior is assumed to be in some parametric family, it is then suggested that repetitions of
the process may yield estimates of these parameters. Acting as though these estimates
were the true unknown parameter values, one can then use the Bayes procedures, hence
the expression parametric empirical Bayes. What estimators are sensible in this context is
generally answered by embedding the unknowns in a larger truly Bayesian model, hence
the incorporation of hyperpriors and the expression Bayes empirical Bayes (see Deely and
Lindley (1981)). Often such truly Bayesian models yield complicated numerical problems
related to the form of the posterior distribution, and in this case some form of Bayesian
estimation is required. Again the HB model provides a structure within which sensible
estimates are easily obtained, but we point out that for the particular problem treated in this
paper, no such estimators are required since $p_j(b)$ in (2.7) and $E(\theta_i|x)$ in (2.8) are easily
computed.

We now discuss the choices for $h_1$ and $h_2$ and their relationship to the form of the
prior information available.

Case  1  :  Exchangeable  and  noninformative. 

In the situation in which practically nothing is known a priori about the suppliers with
respect to $\underline{\theta}$, it is reasonable to assume that $\theta_1, \ldots, \theta_k$ are exchangeable random variables.
Any prior distribution obtained via a mixture implies exchangeability and so in particular
the structure given in the previous section ensures exchangeability for any choice of $h_1$ and
$h_2$. Consistent with the absence of prior information is the assumption of noninformative
hyperpriors. Since $0 < \beta < 1$ we can take a noninformative choice for $h_1$ as the uniform
distribution. For $h_2$ we argue that since $0 < \mathrm{Var}(\theta_i|\beta,\eta) < 1/4$ for all $\beta$ we deduce a
noninformative choice for $h_2$ by putting a uniform distribution on the variance over $(0, 1/4)$.
This gives

$$h_2(\eta) = \frac{1}{(\eta+1)^2}, \qquad 0 < \eta < \infty.$$

It could be suggested that a simple noninformative choice for $h_2$ would be $h_2(\eta) = 1$. This
has been used in the example in the next section for comparison purposes but in the special
case in which the $i$th component of the data vector $x$ is either 0 or $n_i$ for all $i = 1, 2, \ldots, k$,
the improper $h_2(\eta) = 1$ does not yield a proper posterior. Also since the elicited prior
information will concern the conditional mean and variance it will be more convenient to
think of hyperpriors induced via this information or lack of it.


Case  2  :  Prior  information  available. 


It could be the case that some decision makers may have enough prior information to
specify precise values for $\beta$ and $\eta$ in $\pi(\theta_i|\beta,\eta)$, that is, select a particular beta distribution as
a prior distribution for $\theta_1, \ldots, \theta_k$. In fact some parametric empirical Bayes models assume
each $\theta_i$ is independently generated from a particular prior with unknown parameters $\beta_i$
and $\eta_i$. However it has been recognized that this is a rather naive view of Bayesian models
and that the notion of exchangeability amongst the $\theta_i$'s is a more realistic approach (see,
for example, Berger (1985), Chapter 4). Our approach here is to consider prior information
arising from eliciting answers from the practitioner to the following questions:

(1)  Where do you expect the average of the $\theta_i$'s to lie, i.e. can you specify an interval,
say $(s_1, t_1)$, within which you are confident that the average of the $\theta_i$'s will lie?

(2)  How variable do you consider the $\theta_i$'s to be; that is, can you specify an interval,
say $(s_2, t_2)$, within which you are confident all of the $\theta_i$'s must lie?

Answering the first question will determine $h_1(\beta)$ as a member of the beta family
whose mean is taken as the midpoint of the interval $(s_1, t_1)$ and variance as $[(t_1 - s_1)/4]^2$.
This choice is influenced by convenience but it is consistent with the elicited information
while also allowing a small probability that the mean of the $\theta_i$'s is outside the interval
specified by the experimenter. Computation of $h_1(\beta|x)$ could be used to assess the
experimenter's original judgment.

We will use the answer to the second question to determine $h_2(\eta)$ by firstly using
this information to obtain an appropriate distribution on $\sigma^2$, the conditional variance of
$\theta_i$ given $\beta$ and $\eta$. Since the elicited information expresses an upper bound, say $c$, on $\sigma^2$
over all $\beta$ and $\eta$, we take this to imply a flat distribution on the interval $(0, c)$. However
we allow the possibility that the variance could exceed this value but with a density
that decays, as $u^{-m}$, out to 1/4. Note that it is always the case that $0 < \sigma^2 < 1/4$. This
distribution is called the 'shoe' distribution and is given by

$$s(u) = \begin{cases} \dfrac{m-1}{pmc}, & 0 < u < c, \\[2ex] \dfrac{m-1}{pmc}\left(\dfrac{c}{u}\right)^m, & c < u < \tfrac{1}{4}, \end{cases}$$

where $p = 1 - (4c)^{m-1}/m$ is the normalizing constant, $c$ will be taken as $c = [(t_2 - s_2)/4]^2$
and $m$ is chosen so that $P(0 < \sigma^2 < c)$ describes the confidence of the practitioner.
Observe that $P(0 < \sigma^2 < c) = (m-1)/(pm)$, so the determination of $m$ is straightforward. From
this distribution on $\sigma^2$ and using the transformation $\sigma^2 = \eta/(4(\eta+1))$ it is easy to obtain
$h_2$ as given in (2.3).
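The shoe distribution and the induced $h_2$ can be checked numerically. The sketch below assumes the density is flat at height $(m-1)/(pmc)$ on $(0, c)$ with tail $(m-1)/(pmc)\,(c/u)^m$ on $(c, 1/4)$, which is the form that makes $p = 1 - (4c)^{m-1}/m$ the normalizing constant; the function names and the trial values $c = 0.05$, $m = 4$ are hypothetical.

```python
def shoe_norm(c, m):
    """Normalizing constant p = 1 - (4c)^(m-1)/m."""
    return 1.0 - (4.0 * c) ** (m - 1) / m


def shoe_density(u, c, m):
    """Density s(u) on (0, 1/4): flat on (0, c), power-law tail on (c, 1/4)."""
    if not 0.0 < u < 0.25:
        return 0.0
    height = (m - 1.0) / (shoe_norm(c, m) * m * c)
    return height if u < c else height * (c / u) ** m


def prob_below_c(c, m):
    """P(0 < sigma^2 < c) = (m - 1)/(p m), used to pick m from a stated confidence."""
    return (m - 1.0) / (shoe_norm(c, m) * m)


def h2_density(eta, c, m):
    """h2(eta) induced by the change of variable sigma^2 = eta / (4 (eta + 1))."""
    u = eta / (4.0 * (eta + 1.0))
    return shoe_density(u, c, m) / (4.0 * (eta + 1.0) ** 2)


def midpoint_integral(f, lo, hi, steps=20000):
    """Plain midpoint rule, adequate for this smooth one dimensional check."""
    width = (hi - lo) / steps
    return width * sum(f(lo + (i + 0.5) * width) for i in range(steps))


# Hypothetical trial values, not taken from the paper.
c, m = 0.05, 4
total_mass = midpoint_integral(lambda u: shoe_density(u, c, m), 0.0, 0.25)
mass_below_c = midpoint_integral(lambda u: shoe_density(u, c, m), 0.0, c)
print(total_mass, mass_below_c, prob_below_c(c, m))
```

Setting $c = 1/4$ removes the tail and gives $h_2(\eta) = 1/(\eta+1)^2$ for any $m$, which is the noninformative choice of Case 1.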

These hyperpriors will be used in a numerical example in Section 4 to show the effect
on the selection criteria. We remark that other hyperpriors satisfying the information
elicited were used but did not have much effect on $p_j(b)$ or $E(\theta_j|x)$.

4.  Numerical Examples

In  this  section  we  study  the  effect  of  the  hyperpriors  and  give  examples  of  the  relevant 
computations. 

(i)  Effect of sample size on $p_j(b)$

The table below compares the values of $p_j(b)$ when the sample size changes from 10
to 20 for both the noninformative and informative cases.

Table 1 - Values of $p_j(b)$; top figures for $n_i = 10$; figures in ( ) for $n_i = 20$.

                           Noninformative                Informative
                                 b                            b
Supplier   x_i/n_i       0      .05     .10           0      .05     .10

   1         .1        .006    .002    .001         .014    .006    .002
                      (.000)  (.000)  (.000)       (.001)  (.000)  (.000)

   2         .4        .194    .136    .088         .217    .144    .088
                      (.110)  (.060)  (.029)       (.120)  (.063)  (.030)

   3         .6        .800    .720    .628         .768    .672    .564
                      (.894)  (.817)  (.713)       (.868)  (.777)  (.667)

These figures show that as the sample size increases, $p_j(b)$ for the largest sample
proportion increases but the difference from noninformative to informative is not very
large in either case. The noninformative hyperpriors were as in Case 1 of Section 3. For
the informative case, hypothetical answers to Questions 1 and 2 as discussed in Case 2 of
Section 3 were taken respectively as:

(1)  $(s_1, t_1) = (0.3, 0.5)$ and thus $h_1$ was taken as $g(\beta|38, 57)$ where the parameters
were taken as the solutions to

$$\frac{.3 + .5}{2} = \frac{a}{a+b}, \qquad \left(\frac{.5 - .3}{4}\right)^2 = \frac{ab}{(a+b)^2(a+b+1)};$$

(2)  $(s_2, t_2) = (0.1, 0.6)$ and thus $h_2$ is given by (2.3) with $c = .125$ and $m = 4$.

Other choices for these informative hyperpriors were made with very little effect on the
$p_j(b)$.
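The beta parameters quoted in item (1) follow from matching the mean to the midpoint of the elicited interval and the standard deviation to a quarter of its width. A minimal sketch of that moment matching is given below; the function name `beta_from_interval` is hypothetical.

```python
def beta_from_interval(lo, hi):
    """Beta shape parameters (a, b) with mean (lo + hi)/2 and variance ((hi - lo)/4)^2."""
    mean = (lo + hi) / 2.0
    var = ((hi - lo) / 4.0) ** 2
    # For a beta distribution var = mean (1 - mean) / (a + b + 1).
    total = mean * (1.0 - mean) / var - 1.0
    if total <= 0.0:
        raise ValueError("interval too wide to be matched by a beta distribution")
    return mean * total, (1.0 - mean) * total


a, b = beta_from_interval(0.3, 0.5)
print(a, b)
```

For the interval $(0.3, 0.5)$ this gives $a + b = 95$ and hence the parameters 38 and 57 used for $h_1$, up to floating point rounding.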

(ii)  All sample proportions equal but unequal sample sizes

A useful feature of the hierarchical model is its ability to deal with equal proportions
from the $k$ suppliers when the sample sizes are in fact different. It seems to be the case
that the smallest sample size always has the largest $p_j(0)$. The table below shows the
computations for $p_j(0)$ when $h_1$ and $h_2$ are the noninformative choices. Other hyperpriors
did not result in much change in $p_j(0)$.

Table 2 - Values of $p_j(0)$ and $E(\theta_j|x)$; equal proportions but unequal sample sizes.

  j    n_j   [header illegible]   x_j/n_j = .4     n_j   x_j/n_j = .45    n_j   x_j/n_j = .45

  1    10          .384           .375 (.407)      20        .374         40     .351 (.45)
  2    20          .323           .325 (.405)      40        .325         60     .335 (.45)
  3    30          .296           .302 (.404)      60        .302         80     .326 (.45)

The situation is quite different when using the posterior mean as the selection criterion.
The numbers in parentheses are the corresponding posterior means and clearly provide
no discrimination. This is to be expected since $E(\theta_j|x)$ in (2.8) is seen to be a convex
combination of the sample proportion and the posterior mean of $\beta$. But $\beta$ is centered
around the average of the proportions; hence the convex combination of the two will be
very nearly the one value of sample proportions. Again very little effect was obtained by
using informative hyperpriors.

(iii)  Unequal  sample  sizes  and  unequal  proportions 

The  striking  advantage  of  the  hierarchical  model  is  best  displayed  when  dealing  with 
variable  sample  sizes  and  unequal  proportions.  The  table  below  indicates  the  type  of 
computations  possible  for  this  model. 

Table 3 - Values of $p_j(b)$ and $E(\theta_j|x)$; $h_1$, $h_2$ noninformative.

Supplier j          1      2      3      4      5      6      7      8

n_j                18     19     21     23     16     20     22     17
x_j                 3      3      3      3      2      2      2      1
x_j/n_j          .167   .156   .143   .130   .125   .100   .091   .059
E(theta_j|x)     .175   .167   .153   .141   .140   .116   .107   .084

[The rows giving $p_j(b)$ are illegible in the source; surviving fragments: .244, .126, .057, .044.]

For these data we could have a posterior probability of 0.741 of getting the best in the subset
of suppliers 1, 2, 3, 5. Note that when using the $p_j(b)$ criterion supplier 5 is preferred to
supplier 4 even though the sample proportions are in the reverse order. When using $E(\theta_j|x)$
this is not the case. This is reasonable in that the two criteria represent radically different
goals for the experimenter. The $p_j(b)$ criterion should be used when a decision will be used
for a long term and $E(\theta_j|x)$ should be used for the next lot. Further amplification of this
point will be made later.


5.  Remarks,  discussion  and  conclusions 


(i)  Test  of  hypothesis 

There may be some situations in which a decision maker is concerned in the first
instance about testing the equality of the suppliers' quality, i.e. test $H_0 : \theta_1 = \ldots = \theta_k$.
Whereas we feel that this is not in general the ultimate goal of the experimenter, it is
quite easy to incorporate this situation into the model by simply incorporating a prior
probability $\gamma$ that $H_0$ is true (i.e. $P(H_0 \text{ is true}) = P(\eta = 0) = \gamma$) and then computing the
posterior probability of $H_0$ which is given by:

$$\gamma^* = \left[1 + \frac{1-\gamma}{\gamma}\,\frac{f(x)}{f(x|0)}\right]^{-1},$$

where $f(x)$ and $f(x|\beta,\eta)$ are given in (2.5) and, with $N = \sum_i n_i$,

$$f(x|0) = \int_0^1 f(x|\beta,0)\,h_1(\beta)\,d\beta = \int_0^1 \prod_{i=1}^{k}\binom{n_i}{x_i}\,\beta^{\sum x_i}(1-\beta)^{N-\sum x_i}\,h_1(\beta)\,d\beta.$$

Then each $p_j$ should be multiplied by $(1 - \gamma^*)$ to obtain the posterior probability that $\theta_j$ is
largest since $p_j$ as given in (2.7) is conditional upon $H_0$ false, i.e. $\eta > 0$. One could simply
compute the Bayes factor, $BF = f(x|0)/f(x)$, as evidence for believing $H_0$. We point out
however that the model of Deely and Zimmer (1987) seems more appropriate for testing
the equality of suppliers' quality.
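Once the Bayes factor $BF = f(x|0)/f(x)$ is available, converting a prior probability of $H_0$ into a posterior probability is the usual odds update. The sketch below assumes that standard form, $\gamma^* = [1 + ((1-\gamma)/\gamma)/BF]^{-1}$; the function names and the numerical inputs are hypothetical.

```python
def posterior_prob_h0(prior_prob, bayes_factor):
    """Posterior probability of H0 from its prior probability and BF = f(x|0)/f(x)."""
    if not 0.0 < prior_prob < 1.0:
        raise ValueError("prior probability must lie strictly between 0 and 1")
    if bayes_factor <= 0.0:
        raise ValueError("Bayes factor must be positive")
    prior_odds_against = (1.0 - prior_prob) / prior_prob
    return 1.0 / (1.0 + prior_odds_against / bayes_factor)


def rescale_p_best(p_values, gamma_star):
    """Multiply each p_j by (1 - gamma*), since p_j is conditional on H0 being false."""
    return [(1.0 - gamma_star) * p for p in p_values]


# Hypothetical inputs: prior probability 0.5 and a Bayes factor of 3 in favour of H0.
gamma_star = posterior_prob_h0(0.5, 3.0)
print(gamma_star, rescale_p_best([0.006, 0.194, 0.800], gamma_star))
```

A Bayes factor of 1 leaves the prior probability unchanged, which is a quick sanity check on the direction of the ratio.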




(ii)  Comparisons  and  possible  extensions 

It is clear that the HB model offers a much wider class of models than the naive
Bayesian or the empirical Bayesian approaches which have been reported in the literature
thus far. In the first instance, the HB model allows through the hyperpriors $h_1$ and $h_2$
the facility to use prior information about the suppliers as a group whereas the naive
models have no place for such information. We believe that this prior information begins
with an assumption of at least exchangeability, but more informative models are also
possible as we have shown in the examples in Section 4. One could argue that some
approximations of the $p_j$'s or $E(\theta_i|x)$'s might be close enough and not require numerical
integration. There has been some work in this direction (see Albert and Gupta (1985) and
Leonard (1972)) but since the numerical integrations required herein are relatively easy
such approximations would appear to be unnecessary. Secondly, we point out that only a
very simple hierarchical model was used in this paper. It is clear that there is scope for
richer models. For example, one could replace $\beta$ in (2.1) with $y_{i1}\beta_1 + y_{i2}\beta_2$ where $y_{i1}, y_{i2}$
are known "regressors" for $i = 1, \ldots, k$ and $\underline{\beta} = (\beta_1, \beta_2)$ is a vector of unknown "regression"
coefficients with hyperprior $h_1(\underline{\beta})$. This model would incorporate various descriptions of
changes in $\theta_i$ as well as the naive Bayesian model in which each $\theta_i$ is assumed independent
with a known beta distribution possibly with different parameters. This latter case would
be modeled by taking $h_1$ and $h_2$ as point distributions at (1, 1) and 1 respectively and
then solving for $y_{i1}$ and $y_{i2}$ to obtain the given known beta parameters.

Another possible extension of the HB model would involve covering partial exchangeability,
particularly relevant when $k$ is large. In this paper we have discussed analysis when
$k$ is small and have tacitly assumed all $k$ binomial probabilities are exchangeable. It may
be the case that, in a large group of suppliers, exchangeability is only tenable within
subgroups and from subgroup to subgroup there may be exchangeability only in their means.
Of course this fact may not be recognizable until after observing the data. The HB model
should be enriched to allow the possibility of partial exchangeability being indicated by
the data and then proceeding with the selection problem.

Finally it should be noted that the HB model has no difficulty with either small
or variable sample sizes whereas naive empirical Bayes procedures require large sample
sizes to imply their optimality properties. In addition these models cannot give practical
answers to allocation of small samples amongst suppliers. In contrast the formulas for
$p_j(b)$ or $E(\theta_j|x)$ developed herein can be used to generate a matrix of possibilities over
a grid of varying small samples. The experimenter is then given tangible information
by which a satisfactory design can be selected. There has been very little work done in
this area. Recently, Yang (1988) has given sufficient conditions for $p_i(0) < p_j(0)$ as a
function of $n_i$, $n_j$, $x_i$ and $x_j$. He showed that if $x_j - x_i > \max(0, n_j - n_i)$ then $p_j(0) > p_i(0)$.
Although this condition is useful, it does not completely partition the $(x_i, x_j)$ space and
in fact when $n_j - n_i$ is large there are many possibilities for $x_i$ and $x_j$ which do not satisfy
Yang's condition. In particular the region where $(x_i/n_i) = (x_j/n_j)$ (or nearly so) does
not in general satisfy this condition. Our numerical results seem to indicate that over this
region the smaller sample size gives the larger $p_j(0)$; but this remains to be demonstrated
completely.
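The gap left by Yang's sufficient condition is easy to see by coding it directly. The sketch below simply applies the inequality $x_j - x_i > \max(0, n_j - n_i)$ in both directions; the function names and the example counts are hypothetical.

```python
def yang_condition(x_i, n_i, x_j, n_j):
    """True when x_j - x_i > max(0, n_j - n_i), which is sufficient for p_j(0) > p_i(0)."""
    return x_j - x_i > max(0, n_j - n_i)


def yang_verdict(x_i, n_i, x_j, n_j):
    """Return 'j' or 'i' when the condition orders the two suppliers, else None."""
    if yang_condition(x_i, n_i, x_j, n_j):
        return "j"
    if yang_condition(x_j, n_j, x_i, n_i):
        return "i"
    return None


# Equal proportions (4/10 and 8/20): the condition is silent in both directions.
print(yang_verdict(4, 10, 8, 20))
# Equal sample sizes with clearly different counts: supplier j is ordered above i.
print(yang_verdict(1, 10, 6, 10))
```

A `None` verdict marks exactly the region, including equal or nearly equal proportions with different sample sizes, where the ordering of $p_i(0)$ and $p_j(0)$ has to be settled by computing them.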

(iii)  Differences  in  selection  criteria 

It has been proposed in this paper that either the $p_j(b)$'s or the $E(\theta_j|x)$'s be used for
selection purposes. Which to use will depend on the requirements of the practical situation.
If a decision is to be made, say contracting with the selected suppliers for delivery of items
over a period of time, then $p_j(b)$ should be used for either selecting the best or selecting
the smallest subset for which the posterior probability that the largest (by amount $b$) $\theta_j$
is in the selected subset is at least P*, i.e. the PP* rule. If, however, a decision for the
short term is to be made, say which machine to use for the next $n$ items, then $E(\theta_j|x)$ is
more appropriate. To select a subset using this criterion, the requirement could be either
to ensure that the expected number of successes is at least $N^*$ (i.e. take $c = N^*/n$ in
Section 2 (ii)) or to maximize the expected number of successes from a fixed number $r$ of
the $k$ suppliers, $r < k$.

It should be noted that if a decision theoretic approach is taken for the subset selection
problem, there is no known loss function which gives as the Bayes procedure the PP*
rule. Gupta and Yang (1985) give general conditions which must be satisfied by the loss
function in order that the PP* rule is Bayes amongst the restricted class of rules satisfying the
P* condition. If the decision problem is formulated as selecting a subset of fixed size then
the procedure discussed above based on $E(\theta_j|x)$ is Bayes with respect to, say

$$L(S_r, \underline{\theta}) = k\theta_{[k]} - \sum_{i \in S_r}\theta_i$$

where $S_r$ ranges over subsets of size $r$. However the procedure which ensures the expected
number of successes is at least $N^*$ has not yet been shown to be a Bayes procedure in the
decision theoretic sense.

It should also be noted that the two criteria can lead to different subsets being selected,
as shown in Section 4(iii). This is even true when only a single supplier is to be selected.
This is not surprising since the two criteria clearly have different objectives as discussed
earlier. Furthermore the $p_j(b)$ calculation depends on the variance as well as the mean so
when sample sizes are quite different but proportions similar, it is to be expected that the
largest $p_j(b)$ does not correspond to the largest $E(\theta_j|x)$.

(iv)  Conclusions

We have tried to show why the HB model is helpful to the practical problem of
selecting the best amongst $k$ binomial populations. The salient features of this approach
are:

(i)  the ability to deal easily with variable and small sample sizes;

(ii)  the incorporation in the model of prior information concerning the suppliers as a
group;

(iii)  the ease of computation of the selection criteria;

(iv)  the dependence of the optimality qualities upon differences in the observations as
opposed to differences in the unobserved parameter space.

Further work remains to be done to make these techniques available to the experimenter,
but we hope we have made some progress in that direction.

